{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Text Categorization\n",
    "We will explore how graph kernels and graph classification techniques can be applied to natural language processing tasks, using text categorization as the example. Text categorization is the problem of automatically assigning category labels to textual documents. It arises in many applications, including automatic news classification and opinion mining in product reviews. We will formulate text categorization as a graph classification problem, and then solve it with graph kernels and an SVM classifier.\n",
    "\n",
    "The following code reads the data from the disk, applies some pre-processing steps (e.g., stemming), and extracts the vocabulary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Vocabulary size:  7186\n"
     ]
    }
    ],
   "source": [
    "import numpy as np\n",
    "import re\n",
    "from nltk.stem.porter import PorterStemmer\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "def load_file(filename):\n",
    "    \"\"\"Read a file with one tab-separated (label, document) pair per line.\"\"\"\n",
    "    labels = []\n",
    "    docs = []\n",
    "\n",
    "    with open(filename, encoding='utf8', errors='ignore') as f:\n",
    "        for line in f:\n",
    "            content = line.split('\\t')\n",
    "            labels.append(content[0])\n",
    "            # Drop the trailing newline without truncating an unterminated last line\n",
    "            docs.append(content[1].rstrip('\\n'))\n",
    "\n",
    "    return docs, labels\n",
    "\n",
    "\n",
    "def clean_str(string):\n",
    "    \"\"\"Tokenize a raw string and return a list of lowercased tokens.\"\"\"\n",
    "    # Replace every character outside the allowed set with a space\n",
    "    string = re.sub(r\"[^A-Za-z0-9(),!?\\'\\`]\", \" \", string)\n",
    "    # Split off common English clitics ('s, 've, n't, 're, 'd, 'll)\n",
    "    string = re.sub(r\"\\'s\", \" 's\", string)\n",
    "    string = re.sub(r\"\\'ve\", \" 've\", string)\n",
    "    string = re.sub(r\"n\\'t\", \" n't\", string)\n",
    "    string = re.sub(r\"\\'re\", \" 're\", string)\n",
    "    string = re.sub(r\"\\'d\", \" 'd\", string)\n",
    "    string = re.sub(r\"\\'ll\", \" 'll\", string)\n",
    "    # Surround punctuation with spaces so that each mark becomes its own token.\n",
    "    # The replacement strings carry no backslash, so the tokens are '(' , ')' and '?'\n",
    "    string = re.sub(r\",\", \" , \", string)\n",
    "    string = re.sub(r\"!\", \" ! \", string)\n",
    "    string = re.sub(r\"\\(\", \" ( \", string)\n",
    "    string = re.sub(r\"\\)\", \" ) \", string)\n",
    "    string = re.sub(r\"\\?\", \" ? \", string)\n",
    "    # Collapse runs of whitespace\n",
    "    string = re.sub(r\"\\s{2,}\", \" \", string)\n",
    "    return string.strip().lower().split()\n",
    "\n",
    "\n",
    "def preprocessing(docs):\n",
    "    \"\"\"Tokenize each document and stem its tokens with the Porter stemmer.\"\"\"\n",
    "    preprocessed_docs = []\n",
    "    stemmer = PorterStemmer()\n",
    "\n",
    "    for doc in docs:\n",
    "        clean_doc = clean_str(doc)\n",
    "        preprocessed_docs.append([stemmer.stem(w) for w in clean_doc])\n",
    "\n",
    "    return preprocessed_docs\n",
    "\n",
    "\n",
    "def get_vocab(train_docs, test_docs):\n",
    "    \"\"\"Map every distinct token of the train and test documents to an integer id.\"\"\"\n",
    "    vocab = dict()\n",
    "\n",
    "    for docs in (train_docs, test_docs):\n",
    "        for doc in docs:\n",
    "            for word in doc:\n",
    "                if word not in vocab:\n",
    "                    vocab[word] = len(vocab)\n",
    "\n",
    "    return vocab\n",
    "\n",
    "\n",
    "path_to_train_set = 'data/train_5500_coarse.label'\n",
    "path_to_test_set = 'data/TREC_10_coarse.label'\n",
    "\n",
    "# Read and pre-process train data\n",
    "train_data, y_train = load_file(path_to_train_set)\n",
    "train_data = preprocessing(train_data)\n",
    "\n",
    "# Read and pre-process test data\n",
    "test_data, y_test = load_file(path_to_test_set)\n",
    "test_data = preprocessing(test_data)\n",
    "\n",
    "# Extract vocabulary\n",
    "vocab = get_vocab(train_data, test_data)\n",
    "print(\"Vocabulary size: \", len(vocab))"
   ]
  },
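  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the pre-processing concrete, the cell below runs the functions defined above on two made-up questions. The sentences and the names `toy_docs`, `toy_tokens` and `toy_vocab` are illustrative assumptions and are not part of the dataset. It is only a small sketch: each question should come out as a list of lowercased, stemmed tokens, and the vocabulary should map each distinct token to an integer id."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical toy questions (not taken from the dataset)\n",
    "toy_docs = [\"What is the capital of France?\", \"Who wrote Hamlet?\"]\n",
    "\n",
    "# Tokenize and stem, exactly as done for the real documents\n",
    "toy_tokens = preprocessing(toy_docs)\n",
    "\n",
    "# Build a vocabulary from the toy documents only (empty test set)\n",
    "toy_vocab = get_vocab(toy_tokens, [])\n",
    "\n",
    "print(toy_tokens)\n",
    "print(toy_vocab)"
   ]
  },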
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will now transform the documents into graphs and perform graph classification. The function *create_graphs_of_words()* defined below transforms a list of documents into a list of graphs: each distinct term of a document becomes a node, and an edge connects two terms that co-occur within a fixed-size sliding window. We will use this function to generate the train and test graphs, setting the size of the window to 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Example of graph-of-words representation of document\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAADnCAYAAAC9roUQAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nOydd1hUx/eH311Y6UVBQDpKBCzYFRu2RBPsie0H9thiSSzEr9GYqFETNUZTLNFYUGOPscQSW1QsKBZEY0FsVBUUFZCl7f39YdhIpLvsgs77PD6yd+fOnAt3Pzv3zJlzZJIkIRAIBALtINe1AQKBQPAmIURXIBAItIgQXYFAINAiQnQFAoFAiwjRFQgEAi2iX9Cb1tbWkqurq5ZMEQgEgteDc+fOJUqSVDmv9woUXVdXV86ePVs6VgkEAsFrikwmu5vfe8K9IBAIBFpEiK5AIBBoESG6AoFAoEWE6AoEAoEWEaIrEAgEWkSIrkAgEGgRIboCgUCgRQqM0xUIygKJKelsPRfDtXtPearMwtxQH087c3o2cMTK1EDX5gkExUKIrqDMcjH6MYuORHI0IgGA9CyV+j1D/XssOBhBa4/KjGzlTh0nS12ZKRAUCyG6gjLJupA7zNpzDWVWNnnl2Vf+I8D7r9znWEQiU/w86evjql0jBYISIHy6gjLHc8G9Slrmy4Ibs3gwaXfCAHhycjOJu38gLTObWXuusi7kTr59Tps2jb59+5ai1QJB0RCiKyhTXIx+zKw910jLVJH4xwKSjq3Nt61Fs15Y+X0MQFqmill7rhEe87hE47q6unLw4MESnSsQFAchuoIyxaIjkSizskt0rjIrm8VHIjVskUCgWYToCjTGnDlzcHBwwMzMDA8PDw4dOsS0adPo0aMHvXv3xszMjPr163Px4kX1OVevXqV169ZYWlri6VWD3X/sQpIgOWwfqVeO8DTkN6Lm9+DBlukvjfc4+FcSd32rfp0cfogVo/yoVMmKr7766qXZa0ZGBv3798fMzIyaNWuqM+j169ePqKgoOnfujKmpKXPnzi3F35LgTUeIrkAjXL9+nZ9++onQ0FCSk5P5888/ycnFvGPHDnr27MmjR4/w9/enW7duZGZmkpmZSefOnWnfvj0PHjzg3WGTid8+j8yHMZjVfReTGq0x9/kA5wlbsen5ZYHjZyRG8Wj/Eqp0m8hXW47z5MkTYmNjc7XZuXMnffr04fHjx3Tp0oXRo0cDsHbtWpydndm1axcpKSlMnDixVH5HAgEI0RVoCD09PdLT07ly5QqZmZm4urpSrVo1ABo0aECPHj1QKBSMHz8epVJJSEgIISEhpKSkMGnSJCpUqAD2NTGq1ojUK0eLPf6zaycwcm+M3N6LyMR0ZsyYgUwmy9WmRYsW+Pn5oaenR79+/XLNuAUCbSFEV6AR3N3dWbhwIdOmTcPGxoY+ffoQFxcHgJOTk7qdXC7H0dGRuLg44uLicHJyQi5/fhs+VWahb2FDdsrDYo+fnfIQfXPrf/rJxNjYGCsrq1xt7Ozs1D8bGxujVCrJysoq9lgCwasgRFegMfz9/Tl+/Dh3795FJpPxv//9D4Do6Gh1G5VKRUxMDPb29tjb2xMdHY1K9Tzm1txQn6ynCeiZ/iOW/5mpFoSeaSWykh/+04+CtLQ0Hj4sunj/d1YsEJQWQnQFGuH69escPnyY9PR0DA0NMTIyQk9PD4Bz586xbds2srKyWLhwIQYGBvj4+NCkSRNMTEyYO3cumZmZyOOvkBZ5BpMavgDomViS9fhekcY39mhOWuQZlFGXSIy8yMSJE5Hy2lWRD7a2tty6dav4Fy4QFBMhugKNkJ6ezqRJk7C2tsbOzo4HDx4we/ZsALp27cqmTZuoWLEia9euZdu2bSgUCipUqMDOnTvZu3cv1tbW7F46E7vO41FYPXdHmHq/Q2ZiFFELevPgt5kFjl+hsguV3h7O
wz/ms3nGMJYvX46enh7BwcGkpqYWav9nn33GzJkzsbS05Ntvvy20vUBQUmQFzQYaNmwoicKUgldh2rRpREZGsm7dukLbbt26lXG/XUHPtT5Q/Md9mQw61LBlad+GxMfH4+joSIsWLbh48SKdOnXC39+fd955B4VCUYIrEQiKjkwmOydJUsO83hMzXYHOSU5OZvDgwXz22Wd8M6AtRoqSpQTJvBXKoCb2pKamMn36dLy9vTly5AgRERH4+Pgwc+ZMHBwcGDVqFCdOnFD7kgUCbSJEV6BTQkJCqFevHnK5nAsXLhDwbgum+HlipCjerWmkkOP+7CodGtXA3t6eGzdusHHjRmQyGTY2NowePZqTJ08SEhKCvb09Q4cOpWrVqkyePJnLly+X0tUJBC8j3AsCnZCVlcXs2bNZtGgRS5Ys4f3338/1fmFZxnKQAYYKvWJnGZMkifDwcH799Vc2bNhApUqV8Pf3p0+fPri4uJTsogSCfyjIvSBEV6B1bt++Td++fTE2Nmb16tU4ODjk2S485jGLj0Ty1/UEZPybzhHAUF9OVnY2svgrbJk+lLrOFUtsj0ql4vjx4/z666/89ttveHl5ERAQQI8ePbC2ti5xv4I3FyG6gjKBJEmsW7eO8ePH89lnnzF27Fj1xoiCeJiSztbzMVyLT+apMhNzQwWeVcz4oJ4D77VtSWBgIL1799aIjRkZGfz555/8+uuv7N27F19fX/z9/enSpQsmJiYaGUPw+iNEV6BzHj9+zEcffUR4eDjr16+nTp06Gun34MGDjBw5kr///lvjUQnJycls376d9evXc+rUKTp27EhAQICIgBAUioheEOiUo0ePUqdOHaytrTl79qzGBBfg7bffxtnZmVWrVmmszxzMzMzo168fe/fuJSIigqZNm4oICMErI2a6glIjIyODL7/8kqCgIH755Rf8/PxKZZzQ0FC6d+9OREQExsbGpTLGi9y6dYsNGzbw66+/8uzZM/z9/fH396dWrVqlPragfCDcCwKtc/36dQICAqhSpQorVqzAxsamVMfr0aMHjRs31mpaxv9GQFSsWJGAgACNRUCIKsjlFyG6Aq0hSRLLli3j888/Z8aMGYwYMUIryWSuXbtGy5YtiYiIoGLFkkcylBSVSkVwcDDr169n69at1KhRo8QREAVXQZYjgaiCXMYRoivQCgkJCQwZMoTo6Gh+/fVXvLy8tDr+kCFDsLGxUed80BUZGRns27eP9evXs3fvXlq2bElAQECRIiCKHJ8sA0P94scnC7SDWEgTlDr79u2jbt26eHp6EhISonXBBfjyyy/5+eefiY+P1/rYL1KhQgW6dOnCxo0biYmJoXfv3qxZswYHBwcCAgLYs2cPmZmZL523LuQOYz+dRMS3vYn6oeDKxZJEkaogC8oeYqYreCWUSiX/+9//+P333wkKCqJNmzY6tScwMJBnz56xePFindqRFw8ePGDz5s2sX7+eyMhIevbsib+/P02bNuVS7FM++HYXtxYPweGjlf+ktbxP7NIPcZ64A5lcL99+jRR6bBrmg7ejcDWUFcRMV1AqXLp0iUaNGhEfH09YWJjOBReep2jcvHkzN2/e1LUpL1FQDoiRi3eQ+jAeuZEZeibFE8/8qiCLqhhlFEmS8v3XoEEDSSD4L9nZ2dKCBQska2trafXq1ZJKpdK1SbmYMWOG5O/vr2sz8uXrr7+WqlatKpmamkpeXl7SV9/Mk+x6T5dk+hUkkEkyhaFkUqudpGdeWQIkmcJQkikMJbt+8ySXSX9IVu99LOlbOUpyAxPJ0K2e5PDRSqn653ukxGSlBEg//fST5O7uLrm6uur6Ut9YgLNSPrpashx6gjeWuLg4Bg4cSHJyMiEhIerik2WJcePG4e7uTlhYGHXr1tW1OS9RrVo1goODsbOzY8uWLfQfOAjH4cux6TmNxD/m4zgqCEDtXnAat0ntXngWcYonp7Zg0+ML9CvZ8+TUFhJ3zsV00HdsPR8DwPbt
2zl9+jRGRkY6u0ZB/gj3gqDIbN++nfr169OsWTOCg4PLpOACmJqaMmXKFKZMmaJrU/KkZ8+e2NvbI5fL6d27N+a2TiRHXyvSuclh+zBv2hOFtRMyuR4WzXqRcf82KQ/vcS0+GXjuYqlUqZIQ3TKKmOkKCiU1NZVx48Zx8OBBtm3bRrNmzXRtUqEMGzaM7777jmPHjuHr66trc3KxZs0avvvuO+7cuQPA0+QUKtZ4ityg8N102U8ekHRwGUmHV7xwVCIr+SFPlc8jIl6sviwoewjRFRTI2bNnCQgIwMfHh7CwMMzNzXVtUpEwMDBgxowZfPbZZxw/frzMVPu9e/cuQ4cO5dChQzRt2hQ9PT0qu3qgIo8oojxs1jO3xrxZL0xrvrxoaW6o+Oe0snGtgrwR7gVBnmRnZ/P111/j5+fHjBkzCAoKKjeCm4O/vz9Pnz7ljz/+0LUpalJTU5HJZFSuXBmAVatW8Sj6Jgr5y0IpNzYHmTxXRWSzuu/x9NQWMhLuAqBSppJ67TiG+nI8q5hp5yIEr4SY6Qpe4u7du/Tv3x+ZTMa5c+fK7eOqnp4es2fPZvLkyfj5+alLwuuSGjVqMGHCBJo2bYpcLqd///40adqUvIq/yxWGWDTtxb21nyKpsrHtNR1jj2aoMpUk7pxL1pMHyA1MMHSti1TLlx71HRmh9SsSFBexOUKQiw0bNvDJJ58wYcIEAgMDy4RQvQqSJNGiRQtGjBhBv379dG1Ovgxbe5YDV+8XuPU3P16sgiwoGxS0OULMdAUAPHnyhNGjRxMaGsrevXtp0KCBrk3SCDKZjK+//poBAwbQq1cvDAzKZnauUa3dCb6RSFpmdrHPNdTXY2Rr91KwSlAaCJ+ugOPHj1O3bl1MTU05d+7cayO4Ofj6+uLl5cWyZct0bUq+1HGyLHEV5Cl+nmILcDlCiO4bTGZmJlOnTqVHjx58//33LFmy5LWtAzZ79mxmz55NSkqKrk3Jl74+rkzx88JIoUdh8Qcy2fOcC1P8vESWsXKGEN03lMjISFq0aEFoaChhYWF06dJF1yaVKnXr1qVNmzYsXLhQ16YUSF8fVzYN88HDRIlMlYWhfu6PqKG+HAN9OR1q2LJpmI8Q3HKIEN03DEmSWLlyJU2bNlWnGbSzs9O1WVphxowZLFy4kMTERF2bUiDejpZU/Hsr46o9ZNw71ele14HqphlUTrnNuHeqc/J/bVnat6FwKZRTRPTCG8TDhw8ZPnw4ERERrF+//o2s6TVy5EiMjY359ttvdW1KvqhUKmxtbTl37hzOzs4AnDlzho8++ohz587p2LrXg9IuhSQqR7xGlPRmOXToEAMHDqRnz57Mnj0bQ0NDLVpddoiPj6dWrVqEhYWV2fjj8+fP4+/vz7Vr/+ZjePLkCQ4ODjx9+hS5XDyglhRtlUISIWOvAQXfLPdYcDAiz5slPT2dKVOmsHHjRlauXEn79u21bntZokqVKgwfPpzp06fzyy+/6NqcPNm/f/9LfycLCwvMzc2JjY0ts18WZZ3CSiEp//lM7b9yn2MRiaVWCkl8ZZYD1oXcoc/yEA5cvU96liqX4MLzmyU9S8X+K/fpszxEXb7lypUrNGnShFu3bhEWFvbGC24OEydOZMeOHblmkmWJAwcO5Pm38vDwKLM2l3WeC+5V0jILrj0HpV8KSYhuGaekN8vQOUG0atWK0aNH89tvvxW7Iu3rjKWlJYGBgXz++ee6NuUlUlNTOXPmDK1atXrpPU9PTyG6JeBi9GNm7blGWqaK2J+HkvkoVv2eSplC4q75xPzUn+iFfXi0fyk5Lte0TBWz9lwjPOaxRu0RoluGefFmKQ5pmSrWLppHnw9HMWTIEJF1Kg/GjBnDqVOnCA0N1bUpuTh27Bj169fHzOzl5DVCdEvGoiORKLOe7/QzqtqQtJv/rlNlK1OoYFsN+yGLsR+yhGc3Q3l27bj6/fxKIb0KQnTLMC/eLMVGJiM8UdTIyg9jY2O++OILJk+e
rGtTcnHgwAHeeeedPN/z9PTk+vXrWraofJOYks7RiAT1U6JRtYak3fpXdBWWdpg37obc0BQ904ooKtmTnfrvzFaS4K/rCTxMSdeYTUJ0yyj/vVleRFIVQYgluH4vWaM3y+vG4MGDuXPnDgcPHtS1KWryWkTLQcx0i8/WczG5Xhs6e5MRH4kqQ/lS29Rrx8mIv4FxdZ9cx2WgLoWkCYToliHOnz9PvXr1MDMzo0OnbsRt+5qkY2tR3g0nZtEAnoRsJfrHvjzcvZBsZQoPtkwn+nt/ohf05sGW6WQ9zR30n5kUT4NGjbGwsKBr1648evRIR1dWNlEoFMycOZPPPvuMgkIntUVcXBxxcXH55r5wcnIiKSmJ5ORkLVtWfrl272muhWeZvgIDRy+Udy/maqeM/ptHe3+k8gdT0TevnPu9LJW6FJImEKJbRsjIyKB79+4MHDiQR48eUaVBO1KunVK/n52ShCotGcePVlHp3dEgqTCp/TYOI1fiMHIVMv0KPDqwNFefT8IP0fLDqcTFxaGvr8/HH3+s7csq8/Ts2ZPs7Gy2bduma1M4cOAA7dq1yzedplwup3r16sLFUAyeKl92sRlVa0jazdy+/OQLezBr1A1Dp5r59JOpMZuE6JYRQkJCyMrK4uOPP0ahUGDj3QoD++r/NpDJsWwZgExfgVxhgJ6ROSaezZErDJEbGGPRrDfpUZdy9WlSqy0VKrtgYmLCV199xebNm8nOLqGP+DVFLpfz9ddfM2XKFLKydOsDzy9U7EWEi6F4mBu+vBXBqFoj0m7l3tmXnfIIPbNKBfSj0JhNYnNEGSEuLg4HBwd1pIGRHuiZ/RvmpWdsjky/gvq1KlNJ0qFfSLt1DpXyeeYsKSMNSZWtLtetb2atvllcXFzIzMwkMTERW1tbbV1WuaB9+/ZUqVKFoKAgPvzwQ53YoFKpOHDgADNnziywnRDdopOdnY2UFItMlYUk/1c09c0rIzcwJiPhDhUquwJQufvkXJ+vF9F0KSQx0y0jGBkZcfPmTcaPH0+jRo3YvOw7sp8++LfBf8K+np75ncyHMVTp/x3O47dgFzDn5U5TH6pvlqioKBQKhYjXzYOcROfTpk0jLS1NJzaEh4djYWGBq6trge3EBonCefjwIXPnzuWtt97ixNpv0dPLY7b7n9CxxF3f5goVexEJ6FHfUWP2CdHVEdHR0axfv56PPvqImjVr0q9fP549e0ZERARz585lztAupMfnHx8oZaQhUxggNzQhOy2Zx8fXv9TmSfhhIv9cQ3x8PF988QU9evQo9+V3SgsfHx8aNGjA4sWLdTJ+QaFiLyJmuvkTGhrKwIEDcXd358qVK2zatImzJ47Q1svupcLKz/26/4quba/pmNZu91KfMhm08aiskSQ4OQjR1QKSJBEREcEvv/zCgAEDcHNzo379+mzdupXq1auzZs0aHj16RHBwMDExMXTu3Jm9O7fhWq8F8jy+pQHMGnZFykwn+nt/7q2ZgFHV/6x4y6Ba47asWPIj9vb2nD59mgEDBpSJVfqyyqxZs5gzZw5PnjzR+tgFhYq9SPXq1bl586bwzf+DUqkkKCiIxo0b06tXL2rUqMGNGzdYvXo1jRo1Ap6XQjLUzz3ZMHCsgZFb/UL7L41SSCLLWCmQnZ1NeHg4wcHBHDt2jODgYAwMDPD19aVly5b4+vri6elZ6E6x2vUakujki0HNl7+BC8NIocemYT54O1py//59goKCWLZsGSYmJgwbNoyAgAAsLUU+1v8ycOBAnJyc+Oqrr7Q2ZlpaGjY2NsTExGBhYVFoe1dXVw4dOkS1atW0YF3Z5Pbt2yxdupRVq1bRoEEDRo0axXvvvZfvk9y/2+mLvrvzeSmkklXmKCjLmJjpaoD09HROnDjBN998g5+fH1ZWVvj7+3P58mW6devGmTNniIqKYt26dQwfPhwvL688Bffo0aPcu3ePrKwsgoKCiLz2N1OG9Xnlulm2trZMnDiRiIgIFixYQHBw
MG5ubgwaNIhTp06J2e8LTJs2jcWLF3P//n2tjRkcHEydOnWKJLjw5roYVCoV+/bto3PnzjRq1IisrCxOnjzJ3r176dSpU4Gus1ylkArZFV/apZBE9EIJSElJ4dSpU+qZ7NmzZ/Hw8KBly5YMGTKE1atXY2NjU+x+r1+/Tq9evUhJSaFatWps3bqVjh0bY2ldcEq6HGSy549D+aWkk8vltG3blrZt25KQkEBQUBADBgygQoUKDBs2jH79+lGxYsVi2/064erqSr9+/Zg1axY//PCDVsYsSqjYi+SIbseOHUvRqrLDo0ePWLVqFUuWLMHc3JxRo0axadMmjI2Ni9VPXx9XvB0tWXwkkr+uJyDj33SO8G8+3TYelRnZ2r3UKnMI90IRePjwIcePH1eL7JUrV6hXr57aVdCsWTPMzc1L1YbwmMcsPhLJ4WsPSE9XItP/17H/KjeLJEkcPXqUZcuWsWfPHrp06cKwYcNo3rz5G5so58GDB3h5eXH27Fnc3NxKfbw6deqwdOlSmjZtWqT2S5cu5dy5cyxfvryULdMt58+fZ9GiRWzbto2OHTsyatQofHx8NHJfPkxJZ+v5GI5djOTc5au817Y1nlXM6FFfVI7QCTExMQQHB6tFNioqiqZNm6pFtnHjxjqrvLBm0zYW7TlHM7+ePFVmYm6o0NjNkpiYyNq1a1m2bBkymUw9+7WystKQ9eWHadOmcevWLdasWVOq49y7dw8vLy8SEhLQ1y/ag+eRI0eYOnUqwcHBpWqbLkhPT2fLli0sXryY2NhYRowYwYcffliiJ8eicPLkSSZMmMCpU6cKb1wMChJdJEnK91+DBg2k1x2VSiVdv35d+uWXX6QBAwZIbm5ukpWVldStWzdp/vz5UmhoqJSZmalrM9VMnDhRmj59eqmOoVKppGPHjkl9+/aVLCwspICAAOno0aOSSqUq1XHLEk+ePJFsbGyk8PDwUh1n7dq1Uvfu3Yt1TlxcnGRtbV1KFumGu3fvSp999plkY2MjvfPOO9L27du18rk7c+aM1LBhQ433C5yV8tHVN86nm52dzaVLl3JFFigUCnVkwcSJE/H09CyzdahOnz5d6ukIZTIZLVu2pGXLljx69Ii1a9fy0UcfkZ2dzbBhw+jfv/9rv8nC3NycSZMmMWXKFHbu3Flq4xQ1VOxF7OzsyMjIIDExsVz/HVQqFYcOHWLRokUEBwfTr18/jh07hoeHh9Zs0NfX1/727/zUWNLQTDchWSktORIpfbLxvDRo9Rnpk43npSVHIqXEZOUr910U0tPTpRMnTkjffPON5OfnJ1laWkoeHh7SkCFDpDVr1ki3b98uNzO4rKwsydTUVEpKStL62CqVSjp+/LjUv39/ycLCQurTp490+PDhcvO7KwlpaWmSk5OTdPz48VLpX6VSSXZ2dlJkZGSxz23cuHGp2VXaJCUlSQsXLpSqV68ueXt7Sz///LOUkpKiE1vCw8OlmjVrarxfdDHTLWkhxVclNTU1V2RBaGgo1atXp2XLlgwePJiVK1eW29wDf//9Nw4ODjqJr5XJZDRv3pzmzZuTlJTEunXr+Pjjj0lPT2fo0KEMGDCg1PxuusLQ0JDp06czadIkjh07pvGFxcuXL2NsbFyieNuchObNmzfXqE2lSXh4OIsWLWLz5s28++67rFixQucLtgqFQusz3VIRXW1W3Xz06FGuyILLly+rIwsmTpxIs2bNihz/WNYJCQmhSZMmujaDihUrMmbMGEaPHs3p06dZtmwZHh4etG/fnmHDhtGmTZsy654pLv369WPevHns3bsXPz8/jfZd3FCxFykvsboZGRls27aNRYsWcfv2bYYPH87Vq1exs7PTtWmAbtwLGhfdF3d+3P2mE/bDl6GoaJ9n2xcLKQJFEt7Y2NhckQV3797Fx8eHli1bMnfuXBo3boyRkZEmL6nMcPr06TIhujnIZDJ8fHzw8fHhu+++Y/369YwfP57U1FSGDh3KwIEDy+1TRQ76
+vrMmjWLyZMn8+6772r0y2T//v0MGzasROd6enqyatUqjdmiaWJiYli2bBnLly/Hy8uLsWPH0rVr1yJHaGgLfX19MjM1lyu3KGh0OvIqhRT/W3XT1dWVAwcOEBkZycqVKxk0aBDVqlXD29ubjRs3UrVqVVauXMnDhw/Zv38/U6dOpVWrVq+t4ELZE90XsbS0ZOTIkYSFhbF+/Xpu3LiBp6cnPXv25MCBA6hUxbsnyhLdunXDwMCATZs2aaxPpVLJyZMnadu2bYnOL4szXUmSOHz4MB988AHe3t4kJSVx+PBh9bGyJrjwGrgXXqWQojIrm0V/RTKqTgWOHTtGQkICvXv3xtjYGF9fX3x9fQkMDMTLy+u1eXQtDk+fPuX27dt4e3vr2pQCkclkNG7cmMaNG6tnvxMnTuTJkycMGTKEQYMGUaVKFV2bWSxkMhnffPMNQ4cO5YMPPqBChbzzrhaHEydOULNmzRL756tVq0ZUVBTp6ekYGGguA1ZJePr0KWvWrGHx4sXI5XJGjRrF6tWr86xoXNbQhXuhROq1atUqOnfurH7t7u5O1/d7qAspxiwaSMb9WwAo74QR+/NQohf05uH+Jep9/plJ8dxbP5nohf9H9Pf+PNgxjz1nrtEjYCALFy4kLS2NZ8+e8fjxY+rWrcuIESOoWbPmGym4AGfPnqVu3booFJrLYF/amJubM2LECM6fP8/mzZu5c+cONWrU4P3332ffvn3lKlNWmzZtqFatGitWrNBIfyUJFXuRChUq4OLiws2bNzViT0n4+++/GTlyJK6urhw7dowlS5Zw6dIlPvroo3IhuFCO3AutWrUiODgYlUpFfHw8mZmZHA1+ngA48/E9pIw0FDauAKRFhlJlwAKqDP6RZ1eDUd4+/08vEhZNe+I4eg32Q5eQ/TSRlNNbmbBoC5GRkTg7O/PHH3+QkpLCxIkTNXGt5Zqy7FooDJlMRsOGDVm2bBlRUVG89957TJ06lWrVqjFz5kzi4uJ0bWKRmD17Nl999RWpqamv3FdR8+cWhC4SmmdmZrJlyxZat27NO++8g42NDZcvX2bz5s20atWq3G0d14V7oUSiW7VqVczMzAgLC+Po0aN06NABQwtrUu7fJT3qEgZONZHJnndt7hyp6YgAACAASURBVNMDuaEp+hY2GLp4q2fAior2GLnVQ6avQM/YAvPG3Ui9e0mjVTdfJ06fPo2Pj0/hDcs4ZmZmDB06lNDQULZt20ZsbCy1atWiW7du7N69u0zPfhs0aECLFi1eORHOgwcPuHXr1it/iWrTrxsfH8/06dNxdXXlp59+YuTIkdy9e5dp06Zhb5/3Qnl5oNy4F+D5bPfIkSMcO3aMVq1aUbl6PZRRl1FGXcbQqZa6nZ7pv1mrZPoGqDKf15vPTn1Mwo45xPzUn6jvepK4az6qtKcarbr5uiBJUrme6eZH/fr1WbJkCVFRUXTu3JkZM2bg5ubG9OnTiY6O1rV5efLVV18xf/78Vypnf+jQIVq3bv3KrqKcWN3SQpIkjh07Ru/evalZsyb37t1j3759HD16lF69epUrV1d+lBv3AvwrusHBwbRq1YqqtRuhjL6MMvoyBs61Cz0/6WgQIKPKhz/hPH4L1p0nAJK6kGJ5e0wpTaKiopAkCWdnZ12bUiqYmpry4Ycfcvr0aXbu3MmDBw+oU6cOnTt3ZteuXTqv0vsiHh4evP/++8yZk0dNuiKyf//+V3YtQOnNdFNSUliyZAne3t4MHz6cFi1acPv2bZYsWULt2oV/tssTOTNdKa8NBaXEK4nuX3/9RVpaGo6OjrRu5Yvy1jlUaU+pYFu10POljDTkFQyRG5iQlZzI09PbkIG6kKKtrS23bt0qqXmvFTmz3Dfhi6hu3bosWrSI6Oho3n//fWbPno2rqytffvklUVFRujYPgC+++ILly5cTGxtb7HMlSXqlTREvkuPT1ZRgXL16lTFjxuDs7MyBAwdYuHAhV65cYcyYMa/NBqP/IpfL
kcvlWg1pLLHoVq9eHVNTU1q2bAlAP18vFBXtMHCsoS4BXhAWzf+PjHs3iV7QmwdbpmNcvSmSJPH3HyuJi4vjs88+Y+bMmVhaWvLtt9+W1MzXgtfRtVAYJiYm6soWe/fuJSkpiXr16tGxY0d27Nih09mvo6MjQ4YMKVFJn6tXr6Kvr4+7+6vX3bKyssLAwIB79+6VuI+srCy2bdtGu3btaNOmDRYWFly8eFF97E34ote2i0Gj+XSHrT3Lgav3C6xukK8hMmjhYkbFv7eybt06PvjgAz799FOqV69e/M5eM1q0aMH06dNp1674tdJeJ549e8bWrVtZtmwZt2/fZtCgQQwZMqTQsuWlwaNHj6hevTqnTp3irbfeKvJ5ObPHZcuWacQOX19fpk+fTps2bYp13v3791m+fDk///wzLi4ujBo1SmMxyOUNU1NT7t27h6mpqcb61FqNtLyqbhYVQ309Pu3ozQ8//EBERAT29vY0b96cHj16EBoaqkkzyxWZmZmEhYWpK5u+yRgbG9O/f3+OHz/O/v37SUlJoWHDhrz77rts27ZNq7OVSpUqMX78eKZOnVqs8zQRKvYixfHrSpLEiRMn8Pf3x9PTk6ioKHbt2sXx48f5v//7vzdScEH7EQwaFd06TpZM8fN85UKK1tbWTJ8+ndu3b9OyZUs++OAD2rVrx/79+9+4IoqXLl3CxcWl1MsBlTdq1qzJwoULiY6Opm/fvixcuBBnZ2cmT56stbWATz75hGPHjnH+/PnCG/O8KkJwcLBGn1iKIrqpqaksX76cevXqMWjQIBo3bszt27dZtmwZdevW1Zgt5RVtuxc0vr1Lk1U3TU1N+eSTT7h58yYDBgxg/PjxNGjQgE2bNpWpFe3S5E305xYHIyMj+vbty7Fjxzh8+DBKpZImTZrQvn17tm7dSkZGRqmNbWJiwpQpU4qcVP7UqVN4eXlRqVIljdlQ0AaJiIgIxo0bp95oNHfuXK5du8bYsWN1kh60rKLtDRKlsqe2r48rm4b50KGGLQb6cgz1cw9jqC/HQF9Ohxq2bBrmU2h2MYVCQf/+/QkPD2f69On8+OOPeHh4sHTpUtLS0krjEsoMr8umCG3g5eXFd999R3R0NAMHDmTRokU4OzszadIkIiMjS2XMoUOHEhERwV9//VVoW02Fir3If2e62dnZ7Nixg/bt29OiRQsMDQ05f/68+tibuo2+ILS+QSK/7OaShipHJCYrpaVHI6WxGy9Ig1efkcZuvCAtPfrqlSOCg4OlTp06SXZ2dtKsWbN0Uk1BG3h4eEhhYWG6NqPccu3aNSkwMFCqXLmy1K5dO2njxo2SUqnZqiXr1q2TmjRpUmgVjYYNG0pHjx7V6NhZWVmSoaGhdOfOHWn27NmSs7Oz1KRJE2nNmjVSWlqaRsd6XXF1dZVu3ryp0T4poHJEuS9MeenSJalfv35SpUqVpMDAQCk2NlbXJmmMR48eSaampmWqMGZ5RalUShs3bpTatm0rVa5cWQoMDJSuX7+ukb6zs7Mlb29v6ffff8+3TUJCgmRubi6lp6drZExJel7u59SpU5KlpaVkamoqDR48WDp79qzG+n9TeOuttzR2L+RQkOiW+2eNWrVqsWbNGi5cuEBmZia1atViyJAhpbo9UluEhoZSv379MpmHtLxhYGBA7969OXToECdOnEAul9OyZUvatGnDhg0bSE9PL3Hfcrmc2bNnM3ny5HxzRxw6dAhfX1+NRAikpaWxcuVKGjZsSEBAAM7OzixYsIAVK1bQoEGDV+7/TaNcRy/oEmdnZxYuXMiNGzdwcnJSRz2cOXNG16aVGLGIVjq89dZbzJkzh+joaEaOHMnKlStxdHRkwoQJJd5W6+fnh5WVFWvXrgUgMSWdpUdvMnbTBQYHhbIg5BGWTXvwMKXk4n7z5k0CAwNxdnZm27ZtzJw5kxs3btClSxdiYmJK3O+bTrmPXtA1VlZWfPnll9y+fZtWrVrRs2dP2rRpw59//lnu
ws2E6JYuFSpUUFe2CAkJoUKFCrRp04ZWrVrx66+/olQqi9yXTCbj66+/ZtoPKxkSdIbmcw6z4GAE28PiOHztAfcMnTn9rDLN5hxm+LqzXIx+XHinPF8Y2717N35+fvj4+CCXyzl9+jR//PEH7733HnK5vExWkShPvBbRC2UBExMTPv74YyIjIxk8eDCBgYHUr1+fjRs3lotwM+k1zSxWVqlWrRpff/01UVFRfPLJJ6xduxZHR0fGjRvHlStXitTHHX1HZO+M5+C1B6RnqXJVwAbIyJZIz1Kx/8p9+iwPYV3InXz7evjwIfPmzeOtt95i2rRp9OrVi6ioKObOnUvVqrlzmwjRfTWEe0HDKBQK+vXrR3h4ODNnzmTRokVUr16dJUuWlOlws9u3b1OhQgUcHR11bcobhUKhUFe2CA0NxcTEhLfffpsWLVqwZs2afO+ZnIKsklwBFByg/mJB1v8K79mzZxk0aBDu7u5cvnyZjRs3EhoaysCBA/Ot/+fh4UFERES5rkOnS4R7oZSQyWR07NiR4OBg1q5dy969e3Fzc2PWrFkkJSXp2ryXELNc3ePm5sbMmTO5e/cugYGBbNy4EUdHRz7++GMuX76sbpdTkDU5IZ6733RCUhUtEXtOQdbQWw8ICgqiSZMm9OjRA09PT27cuEFQUBCNGzcutB9TU1MqVapUZrKwlTeEe0ELNG/enJ07d3Lo0CEiIiJwd3cnMDCwRKn6SouQkBAhumUEhUJBt27d2LNnD+fPn8fS0pIOHTrQrFkzrK2t+d8Pa0tckDUtI4vuk5ewYcMGPv/8c27evMn//vc/rK2ti9WPcDGUHOFe0CI1a9YkKCiICxcukJ2dTe3atfnwww/LxM0rdqKVTVxcXJgxYwZ3795l0qRJpD5L48LdhyXKrAeATIaxeyN+3bqDzp07o6dXsoRRpV1F4nVGuBd0QE6c440bN3BxccHX15f333+f06dP68Se9PR0Ll26JGIuyzD6+vps2bKFdGUaCb9/TdT8HqReCwYg9e8jxCweRPT3/jw5uUl9jiSpeHJqC7FLhxC98P9I2P4N2WnJyGUytp5/tZAvMdMtOcK9oEOsrKz44osvuH37Nm3atKF37960bt2affv2aTXc7OLFi7i7u2s0v6dA86xduxZTKzsq9/gC5wlbMfF8ntA/PeZv7IcuxbbPTB6f2EBm4vN6b8lnd/HsRgi2/t/gOHoNckNTHu1fgjJL9coFWYXolhzhXigDmJiYMGbMGG7cuMGQIUP49NNPqVevHhs2bNDKH0csopVtMjIyuHv3LqdOnSIzD1+uRXN/5AoDKthWpYKNGxkPnqeaTA7bi6VvP/TNrZHpK7Bo4c+z6yeQVNmvXJBViG7J0bZ7QewvLQCFQkHfvn0JCAhg7969fPPNN0yZMoUJEyYwaNAgjI2NS2Xc06dPF7sSgODVUalUJCQkEBsbS1xcHHFxceqfXzz2+PFj7OzssLe3R5X98oc13wrYTxJI2DYLZC/MdWRyslOTSHqgIiLCgmrVqpXIr+vg4EBKSgqPHz8WaRuLibbdC0J0i4BMJsPPzw8/Pz9OnjzJnDlz+Oqrrxg9ejSjRo2iYsWKhXdSDE6fPs2kSZM02uebjCRJPH36tFAxvXfvHpaWltjb2+Pg4KD+v1GjRnTt2lV9rHLlyuoUidZVnFDIi1ZHTM/cGiu/TzB0rJH7ONnEXz1Lh8XjSEhIoGbNmtSpU0f9z9vbu9Ak9jKZDA8PD65fvy6ekoqJtt0LQnSLSbNmzdixYwdXrlxh3rx5VKtWjUGDBjFu3LgSb2RITEln67kYrt17ysOnz0ip/QHBCQZUSUnHytRAw1fweqFUKl8S0rxEVU9PTy2aOWL61ltv0apVK/XxKlWqFDshjaujPdGP76HvXKfQtmZ13+Px0TVYdxqPvoUN2c+ekB5zlYo1mvHHD1OwMp3BkydPCA8PJzw8nIsXL7JmzRouX76MjY1NLiGuU6cObm5uufLj
5iQ0F6JbPIR7oZxQo0YNVq1aRXR0NAsWLMDb25tu3brx6aef4uXlVaQ+LkY/ZtGRSI5GJACot40aeLTk+8ORLDwcSWuPyoxs5U4dpzfrkTE7O5v79+8XKqapqalUqVIll5ja29tTt25d9TF7e3vMzMxKxc6pn08mYPBwEg6txKJZ7wLbmjXqAkjc3zSV7JRH6BlbYOLVkjbdu6q/XC0sLGjZsqW6ynbO7yIyMlItxKtWreLixYskJSVRu3ZttQibmJgQHh5eKtf5OqNt94JGqwG/yTx69IhFixbx008/0bRpUyZNmlRgnO3zbaPXUGZlFxjjKZM9L9o5xc+z0Aob5QFJkkhKSsr3ET/n54SEBKysrHIJaV4/W1lZ6bxM+MXox/RZHkJaZvE3SBgp9Ng0zEddH7A4PHr0KNes+MiRI9y5cwc3N7eXZsUuLi46/z2VVUaNGkWNGjUYNWqUxvosqBqwEF0N8+zZM1auXMn8+fPVpWLefffdXDd8zj79tMyi75V/Xrwz71pyZYVnz57lKaD//dnQ0DBfMc3539bWFoVCoetLKjIl+Zsa6MmY2qmGxv6mly5donfv3mzdulUtxDn/UlJS8Pb2ziXEtWrVKrXF4PLEJ598gpubG2PHjtVYn0J0dUBWVhabN29m+vTp6j3xs2fPpm2PQa80K2qdcoTkBzGsW7dO0ybnS2ZmJvfu3StwESo2Npb09PQ8BfS/P7+uH/QiP70AUlY6KcFrOb76azw9PTUyvlKpxNLSkuTk5Je+sBITE18S4mvXruHs7PzSrNjR0fGNmhUHBgZiZ2dHYGCgxvosSHSFT7eU0NfXx9/fn4MHD1KjRg2SkpL44Ycf2P3UAWWmSYn6VGZlE3rnEZ4a0ixJkkhMTCxwVhobG8ujR4+oXLnyS2LaunXrXMcsLS3fqA/rf+nr44q3oyWLj0Ty1/UEZIDyhfSOhvpyJKCNR2X6eFeiV9AImjVrxtmzZ19K11gSDA0NcXBw4Pbt21SvXj3Xe9bW1rRt25a2bduqj2VmZnLt2jW1GP/4449cvHiRjIyMl2bFNWrUyDfLWXlHRC+8ZkRFRdGnTx+GDBnCn0dPMmJPAlIJt6RIEtxOTMXNvvBH2OTk5EIXoeLj4zEzM8tzEapjx47qYzY2NiXOCfCm4e1oydK+DXmYks7W8zFci0/mt1176NDWl4buVehR31G9aBYcHEzjxo1p2rQpZ8+excnJ6ZXHz9kk8V/RzQuFQkHt2rWpXbs2AQEB6uP3799XC/Fff/2lrsjyX1+xt7c39vb25f6LVkQvvEa0bduWo0ePcvz4ccaOHcvY7zfycO8PpEaeRaYwwLROByya9UImkxOzeBCV35+CgZ07KZf/4uEf86kyZDEVrJ1JvvgnaZGh2HzwOTIg8t5jTp48WeDjfnZ29kszUzc3N5o3b64+VqVKFQwNDXX9a3otsTI1YLhvNQDO/DCKvtWa0+qf1zl4eHiwd+9e2rdvT/PmzTlz5gx2dnavNG6O6Hbp0qXEfdja2vLOO+/kKhefnp7OtWvX1K6J+fPnc/HiRSRJesk94eXlhYFB+Ql1FJsjXiMOHz5M69at6du3L0OGDMHLtxNZaak4jPgFVVoy9zdNRc+0EmZ12mPoVIv0u+EY2LmTHn0ZfUs70qMuU8HamfSoyxg61QIgSyURcTeGCRMm5Jqh1qxZM5fAmpubl/sZyOuCq6srd+/ezfO9Zs2asXbtWgYMGICvry8nT54sdlrHF/H09OTUqVMlPj8/DAwM1KKagyRJxMfHq2fFf/75J3PnzuXWrVu4u7u/NCt+1S8UTZMTHx+cWZW0bHiy6QKedub0bOBYqvHxQnS1RHZ2NhGn9mM78HvkBsbIDYwxb9yd1MuHMavTHgPn2qTdCMG8yfsoY/7GvGlPlHfCMKvvhzL6MmaNuqr7quLgyKmTe3V4NYLi4OLikq/oAnTv3p3Y2Fi+/PJL2rVrx9GjR0u8ldfDw4NVq1aV1NRi
IZPJ1F/87777rvq4UqnkypUr6lnxnj17uHjxIgqF4iUh9vLy0nqUykvx8dmVAIgNi8NQ/x4LDkaUany8EF0tkZiYiCorE31zG/UxfQsbslMeAmDoXIvHh1eQnZIEKhUmni15cnwDWY/vo0p/RgXbfxdaFHoiT1F5wsXFhdDQ0ALbjB49mqioKNatW0eHDh04ePBgiTZ05LgXJEnS2ZOOoaEh9evXp379+upjkiQRGxurFuJdu3apq3J4eHi8JMaVK1cuFdsKizDJWfjcf+U+xyISSyU+XoiulrC2tkZPX4E8NREqPN8unPU0AT1TKwAUFe2RKQx4em4nhk61kBsYo2dSkeSL+zB0rIHsnyQp+nIZ5obiz1aecHFxYcuWLYW2++abb4iJieHUqVN07tyZPXv2FDu8LkesEhMTS024SoJMJsPR0RFHR0c6duyoPv7s2TP+/vtvtRhv376d8PBwjI2NXxJiDw8P9PVLfu8XJZZaeTecxD/m4zgqSF3HDtCo8IpPr5bQ09Oj+wcfsPdIEJX8xqFSpvD0zHbMm3RXtzFwrk3yuT+o1P4jAAz/eW3RvI+6jQQ4V3o941xfVwpzL+Qgl8tZtWoV7777LvHx8XTv3p2dO3cWa1FKJpOpZ7tlSXTzw9jYmEaNGtGoUSP1MUmSiIqKUgvxb7/9xhdffEFsbCxeXl65hLhOnTpUqlSp0HFy6tgVZ/MK/FvHztvRskS7BvNCiK4WWbZkMY07+XNr6RBk+hUwrdMBU+9/V4gNnWrx7MpR9aKZgXMtnp7Zpn4tk4GbtQkG+s90Yr+gZDg7OxMdHY1KpcqVoCYvDAwM+P3332nRogXx8fH06dOHzZs3F8vvmSO6L+ZvKE/IZDJcXFxwcXHJFYWRkpLC5cuX1WK8efNmLl26hIWFxUuz4rfeeitXmOOiI5ElrmOnzMpm8ZFIlvbNc69DsRGiW8ocOXJE/XPFihXZunFDvjvSzOq9h1m999Svjd0b4zLpD/VrQ309Vv4wV2PfuALtYGJigrm5Offv36dKlSqFtre0tGTfvn00a9aMO3fu0L9/f9atW1fkWOnXNaF5rVq1GDVqFGvXruXmzZv06dOHrVu30qdPHw4dOsSVK1cICwvj6tWrxMbGIpfLUalUODi5kN1uPLJKz+OgYxYPxqxBJ1IvHybr6QOM3Bpg3WkcMv2XM8w9PbuTlAt72R8wi4fdanPqyAE+//xz7ty5Q40aNVi6dCne3t7MmzePkJAQfvvtt0KvQ6zIaJk6TpZM8fPESFG8X/3z3AueQnDLKUV1MeTg6OjI7t27iY2NJSIigqFDh6JSFe3R+HUVXYDffvuNAwcOEBERwa5du+jYsSMLFy7k8ePHuLq60rhxY/bt24e+vj5ff/013377Lfp27sRtno70QsL5Z9eCsek1HYcRK8hIuE3KpYMvjfX4xAZSLx3CNuAbKphb893GfQwePJiff/6Zhw8fMnz4cLp06UJ6ejp9+/Zl3759PH78uNBrEKKrA/r6uDLFzwsjhR6FLTDLZM9zLpT1ZDeCgimu6ALUrl2bTZs2ER0dzYULF/j444+LVKvvdRbdMWPGYGtri4ODAy1btqRJkybUq1cPAwMDunfvzoULF9i0aRMdO3Zk7NixjBkzhvdGfYWUlUF6zFV1P2YNuqBvZoWekRnG7o3JuH/r30EkiUeHlqO8fQHb/5uNnrEFyiwVuzatY/jw4TRp0gQ9PT0GDBiAgYEBISEhVKlSBV9f3yItmArR1RF9fVzZNMyHDjVsMdCXY6if+09hqC/HQF9Ohxq2bBrmIwS3nFMS0QVo06YNCxcuJDExkeDgYCZOnFio8Lq5uREbG4tSqSypuWUWW1tb9c9GRkYvvU5JSSEuLg4XFxf18eQMFXrmlcn6JzwTXi6pJGX++7tSpaeSEvYn5j49kRv+mycl6UEc8+fPx9LSUv0vOjqauLg4AAYMGFCkRFTCp6tD8tqn/1SZibmhAs8qZrn26QvK
Ny4uLiWeffr7+xMTE0NQUBB79+7FxMSEadOm5dteoVDg5ubGjRs3qF27dgktLr/Y29tz6dIl9WszAz2ynyag/094ZmHIDU2x7jSBhB1zkL8/RV1eybJyFUb08mPKlCl5ntetWzc++ugjLl++XHD/RbwOQSmSs09/Qe+6rBjQiAW96zLct5oQ3NeIks50c/j0009p27YtFStWZOPGjcydO7fA9q+zi6EwevXqxe7duzl06BCZmZnEHN2CXL8CBo5Fq+gCYOjijXXnQBK2zSI97jqG+nI69+rL0qVLOX36NJIkkZqayu7du0lOTn5+jqEhPXr0wN/fv8C+hegKBFqgoPwLRUEmk7Fw4UKsra2pUaMGP//8Mz/99FO+7T09Pbl+/XqJxyvPeHh4sG7dOsaMGYO1tTXxl45j3/tLZHrF225s5FYPK7+xPNj6FWnxN5jg/x7Lly9n9OjRVKxYEXd3d1avXp3rnAEDBuSaZeeFSGIuEGiBx48f4+TkxNOnT19pe25aWhrt2rWjbt267N69my+++IIPP/zwpXZBQUEcOHBAq8nuyzLD1p7lwNX7BSaXzw+ZDDrUsC1SnG5UVBSenp6kpaXlm8RczHQFAi1gaWmJXC4nKSnplfoxMjJi586dHDp0iEGDBvHFF1+wfv36l9pVcavO+TQrxm66wOCgUMZuusDSozd5mJL+SuOXV0a1dsdQv2Q5oQ319RjZ2r3QdiqViu+++44+ffoU2E4spAkEWiLHr1uUbasFYW1tzd69e2nRogWTJk1iwoQJGBkZ0b17d3UGrSPXk0h3a8X2sDj1edrIoFVWyYmP/2r3FdKzij7dLWp8fGpqKra2tri4uLBv374CM70J0RUItISLiwt37tyhXr16r9xX1apV2blzJ++99x7z5s1jxIgRnH5UgR139dQZtGSK3Aux2sigVZb5v0ZOzJ//Hdnu75Atk2u0CreJiQkpKSlFskO4FwQCLfGqEQwAq1evpkWLFgA0bNiQoKAgJk2ahP8XS9hwNZ07v35OcvihAvuQJNQZtNaF3Hkle8oT33//PWb3w9jyUXOdxseLma5AoCU0Ibr/xc/Pj1FT57DijgyZogK2vaYX+dzSyKBVVrl69SqzZ8/m9OnTVHOuxNK+lXQWHy9EVyDQEq6urqVSSifavCbo3yvRuZrOoFUWycrKYsCAAcycOZNq1f6tU/diHTttItwLAoGWKO5MNzo6mvfff5/KlStjZWXF6NGj1e8FBgZSsWJFXFxd2bNnD/A8DO3er5NIvvgnACnhB7m3biJJh1cQvaA3MUs+JO3mvyGgKmUqCbu/55eP2lPF3oHPP/+c7Ozn2e8iIyNp1aoVFhYWWFtb07t3bw38BnTDnDlzsLS0ZPjw4bo2BRCiKxBojeKIbnZ2Np06dVIvvsXGxqpDkU6fPo2HhweJiYk0e38w9/74Pt98DOlx19Gv5IDjJ+uxaPIBD/f+oG6buHsBMrkeVUeuYPyS39m/fz+//PILAFOnTqV9+/YkJSURExPDmDFjNPAb0D5hYWF8//33rFixoswUahWiKxBoCRsbG1JTU4u0yn3mzBni4uKYN28eJiYmGBoaqhfQXFxcGDp0KHp6elSq+zbZKY9QpeadUlDf3Aazuu8ik+thUrutum12ahJpt85Ssd1QMuQViFNWYNy4cWzcuBF4nr/h7t27xMXF5Rq7PJGens6AAQOYN28eTk5OujZHjRBdgUBLyGQynJ2dizTbjY6OxsXFJc+aYC+WMk+Tnm9tVWWm5dnPi9m05ApDddusJw8gO5uYn/oTtaA3iwe1ZPjw4Tx48ACAuXPnIkkSjRs3pmbNmqxcubLoF1pGmDFjBq6urvTv31/XpuRCLKQJBFokx8VQs2bNAts5OTkRFRVFVlZWgcUYS1qkVM+8MjJ9BU6frEcm16N7XQcW9K6rft/Ozo7ly5cDcPz4cd5++218fX1xdy98Z1ZZ4PTp06xYsYKwsLAy41bIQcx0BQIt
UtTEN40bN6ZKlSpMmjSJ1NRUlEolJ06ceKmdp515iezQN62EoWs9kg79giJbSXVbE27evMnRo0cB6RYg3QAAEdJJREFU2LJlCzExMcDzMlMymazI5YJ0zbNnz+jfvz8//vhjrqeCsoIQXYFAixR1MU1PT49du3YRGRmJs7Mzjo6ObNq06aV2PRo4ltgW607jkVRZ3Fk6nP91bUiPHj2Ij48HIDQ0lCZNmmBqakqXLl34/vvvcXNzK/FY2mTKlCnUr1+fnj176tqUPBFZxgQCLbJu3Tp2797Nhg0bNNantjJolQeOHDlCQEAA4eHhWFkVLWl5aSCTyUSWMYGgLFAau9K0kUGrPJCcnMygQYP4+eefdSq4hSFEVyDQIqUhuiWtMG34mlWYDgwMpG3btnTq1EnXphSIiF4QCLSIvb09CQkJpKenY2Cguf39OYlZZu25hjIzC4kCVuwlFTIpG5eHlwlo8q7GbNAlf/75J3/++Sfh4eG6NqVQxExXINAi+vr6ODg4EB0drfG+cypMW6ZEoSeT8sygVUFPhl7833Qzvc2941v4/vvvNW6HtklKSmLIkCGsWLECc/OSRXNoEzHTFQi0TI6LoTRiXj0qG3F77WROh/3NkShlnhm0niXVplmzZkycOJHZs2fj5eVFhw4dNG6Ltvjkk0/o1q0b7dq107UpRUKIrkCgZUrDr5vDX3/9Ra1atfBwdcDDNe82VqZO7Nmzh3bt2jF16lT69etHcHAwHh4epWJTafL7779z6tQpwsLCdG1KkRHuBYFAy5Sm6G7fvp1u3boV2q527dps3LiRmTNnMmrUKDp37vzK9du0TUJCAiNHjmT16tWYmJjo2pwiI0RXINAypSW6KpWKnTt30rVr1yK1b9u2Ld999x0rV67E19eX3r17k5WVpXG7SgNJkhgxYgT9+vWjefPmujanWAjRFQi0TGmJbmhoKBUrVuStt94q8jkBAQGMHDmSM2fOkJWVRWBgoMbtKg02bNjAtWvXmDFjhq5NKTZCdAUCLZOTI1fT7Nixo8iz3BeZOHEiLVu2JDs7mz179qhz6pZV4uLiGDt2LGvWrMHQ0FDX5hQbIboCgZZxdnYmLi5OXaVBUxTVn/tfZDIZP/zwAxUrVsTT05PJkycTHBysUds0hSRJDBkyhJEjR9KgQQNdm1MihOgKBFrGwMAAKysr4uLiNNZnREQEjx8/pmHDkuVQ0NPTY/369SQkJNC2bVt69epVKrPxV2XFihXcu3ePKVOm6NqUEiNEVyDQAZr26+7YsYMuXbogl5f8I21sbMyuXbs4f/48LVq0oGvXrkWqcqEt7ty5w2effcaaNWtQKBS6NqfECNEVCHRAaYhuSVwL/8Xa2pp9+/Zx/PhxKleuTL9+/VCpVBqw8NVQqVQMHjyYTz/9lFq1aunanFdCiK5AoAM0Kbr379/n8uXLtGnTRiP9Va1alZ07d3Lx4kVu3brFl19+qZF+X4VFixahVCqZMGGCrk15ZYToCgQ6QJMRDH/88QcdOnTQaAKdRo0aERQURHx8PKtWrVIXrNQFERERzJgxg6CgoHJTvaIghOgKBDqgqGV7isL27dtLFCpWGH5+fsyaNQuZTMbo0aPRRUGD7OxsBgwYwJdfflms+OOyjBBdgUAHaMq9kJKSwtGjR/Hz89OAVS8zdOhQBg0aRMWKFenWrZu6nI+2+PbbbzE2NmbkyJFaHbc0EaIrEOgAFxcXoqKiKKhcVlHYv38/TZo0wdKy9BKRT58+nRYtWmBiYkLXrl1RKpWlNtaLXLp0iW+//ZaVK1e+UlRGWeP1uRKBoBxhamqKkZERCQkJr9SPpqIWCuL/27v/oKrrfI/jz+/hHDksP0QQBYXA7LooibWXdTTN9N7kmilZspuLFmVDaDVt987esZ1ma7aus05NeXU3BUHqlqxtiCs4gIC5YNSyDHVH8hesLgqMIj/khyAHzo/v/YOQa/wQhHO+B3g//tLz4/N9MwwvvnzO5/N5
K4rCvn37CAkJobGxkbi4uBH/sriTrq4uYmNj2bFjB8HBwXa9lqNJ6AqhkZFOMVgsFrKysoiKihrFqvpnMBg4dOgQnp6enDhxgnfffdeu19u+fTsBAQFs3rzZrtfRgoSuEBoZaegWFRUREhJCUFDQKFY1ME9PT3JyctDpdOzYsYOjR4/a5TqlpaXs3buXpKQkFGWQtkNjlISuEBoJCQkZ0bIxe61aGExAQAB5eXkoisIzzzzDmTNnRnV8k8nEs88+y65du5gxY8aoju0sJHSF0MhI7nRVVXXIfG5/5s6dS2ZmJjabjVWrVtHY2DhqY7/55puEhYWxYcOGURvT2UjoCqGRkYRuWVkZiqJotiV26dKlfPTRRzQ3N7NmzRrMZvOIx/zqq684cOAAe/bsGZfTCj0kdIXQyEhCt+cuV8twWr9+Pe+88w6nT5/mxRdfHNFY7e3txMbGsmfPHvz8/EapQuckoSuERkYSulrM5/bntdde47nnnuPzzz8fUTv3bdu2sWTJEk2mSxxNugELoREfHx8sFgstLS1Mnjx5yO+rqqqiurraaXqD7dq1i8rKSrZt20ZYWBiPPvrosN5//PhxMjMzKSsrs1OFzkVCVwiNKIpy6wyG8PDwIb8vIyODxx9/HL3eOX58dTod6enpLFy4kCeeeIKysjJmz5596/mGtk4OfVPD+dpWWk0WvIx6Qv29+Nk/B6K3mnjhhRdITk626646Z+Ic3zUhJqie08aGG7qvvPKKHasaPldXVwoKCggLC2PJkiVUVFRQ2WLjw4ILFFZ077rrtPSey2vU17LzeAWebdU8tPYXREZGalW6w0noCqGh4c7rNjU1UVJSwsqVK+1Y1d2ZMmUKxcXFzJs3j0Wb/hPbA+votNjob8ew6fsA7pwUQJtHEAeKL7FpUYhjC9aIfJAmhIaGG7rZ2dmsWLECd3d3O1Z19+655x5+vT+btjmRmMz9B+5tdDpMFhvbs89xoPiSI0rUnISuEBoabug6y6qFgZyqbubAmZvoDIO3RreZO6lL+y1VO39O/Z9/R4fZxvbs85TVNDuoUu1I6AqhoeGErslkIj8/nzVr1ti5qrv3YcEFTJY7t5a/Wf4V1pvNBP3yIH5P/hoAk8XKnoIL9i5RcxK6QmhoOKF74sQJ5s+fz7Rp0+xc1d1paOuksKL+zlMKgKWlDoPPTBRdb/sdVYW/lNfT2NZpxyq1Jx+kCaEhf39/Wlpa6OjowM3NbdDXZmRkOPXUwqFvavo8Zm6opjH3Q7rqKtF7+uL9SCxdtRdo+WsaoHKzopgpj76I54Lu1QsKcOjbGuKXze4z1nghoSuEhnQ6HUFBQVy+fJnQ0NABX2ez2cjMzOTkyZMOrG54zte23rYsTLVaqDv0Nh7hK5m+4R1M1WepP/xfBMTuBMDSfJWpa3912xgmi43zV284tG5Hk+kFITQ2lCmGkpISfHx8nLo5Y6vJctv/O6+cx2buwGtxNIqLAbeQBbjN/intZwvvMM7ID89xZhK6QmhsKKF75MgRpz+XwMt4+x/O1rbr6D39UJTemNFPnoa1bfCjIL2MBrvU5ywkdIXQ2FBC19nncwH+yc8dvdL7KZqLhw+WG/Woau+Ug6W1HhcP3wHHMOp1hAZ42rVOrUnoCqGxO4VueXk5ra2tREREOLCqobt48SKvv/46bz+/Gqu1d7mY64wfozMYaS1OR7VaMF0uo+NCCe7zlg04lgpE/yTQAVVrR0JXCI31HHozkIyMDKKiopyqDbnZbCY9PZ3IyEgWL16MxWLhy/xsVt4/k54jfhUXA37rf0PHP76hencM1/P2MvXxf8fg239PN0WBFT/2w9fD1YFfiePJ6gUhNNZz6M1Ajhw5wltvveW4ggZx6dIlkpKSSElJYc6cOcTHx/PUU09hNHbvQHvZrZkv/95Ah7n7jneSXzD+G3f0Gcf74Y19HjPqXXhp+X32/QKcgPP86hRigpo5cybXrl3rt+VNbW0tZ8+eZfny5Y4v
7HsWi4WMjAwee+wxIiIiaG9v54svvqCwsJCYmJhbgQuwIMibN1aH4mYYXrS4GXS8sTqU8MDxf7yj3OkKoTGDwYC/vz81NTXMmjXrtueOHj3KqlWrcHV1/J/c1dXVJCcns3//foKDg4mPj+fw4cN33MTRc1rY9uzzmCzWQXeoKUr3He4bq0MnzCljErpCOIGeD9N+GLoZGRls3Nj3T3F7sVqt5OTkkJiYyNdff01MTAw5OTnMnz9/WONsWhRCeKA3ewou8JfyehR6j3OE7lUKKt1zuC8tv29C3OH2kNAVwgn0t4Khra2NkydPkpqaavfrX7lyhf3795OUlERAQABbtmzhs88+G9ERkuGB3iRsiqCxrZND39Zw/uoNWk1mvIwGQgM8if5J4Lj/0Kw/ErpCOIH+VjDk5uayaNGiYfVPGw6bzUZeXh6JiYkUFhby9NNPk5mZyQMPPDCq1/H1cB3XZykMl4SuEE4gODiY4uLi2x6z1y602tpaUlJSSEpKwtfXl/j4eD799FM8PDxG/VqiL1m9IIQT+OGyMbPZTHZ2NlFRUaMyvs1mIz8/n+joaObOnUtlZSVpaWmUlpYSFxcngetAcqcrhBP44ZxuUVER9957L4GBI9udVVdXx8cff8y+ffvw8PAgPj6elJQUvLy8RlqyuEsSukJorKGtk7xqlRv3P8Xmj0vwcjNQUfK/REatv6vxVFWloKCAxMREcnNzefLJJ0lNTWXhwoUoPdvFhGYkdIXQyKnq5ttalLvNfYQT5d3/RjeLv5sm0XCglJceuY8FQXdeUtXY2HjrrtZgMLBlyxYSEhLw9p44y7HGAgldITRwoPjS4JsH9JMw2yDv7DVOVjQMuHlAVVWKiopITEwkKyuLtWvXkpKSwkMPPSR3tU5KQlcIB+sO3HN0mG13fK2qQofZyvbsc0Dvbq+mpiY++eQTEhMTUVWV+Ph4du/ejY+Pjz1LF6NAQlcIBzpV3cz27PNDCtz/r6dFOU3V5H+WTGZmJqtXryYhIYGHH35Y7mrHEAldIRxoqC3K+9PRaeY3B7/k5fD5vP/++0ydOhWLxSKBO8ZI6AphJ++99x7FxcWkp6cD3asU0na/jRUF76UbuX4iGdPFUlAU3MNX4r00BkXngrnpKo05v8dcVwmKgnHWg/hGbkVn9GBS8IPs/v0WrFYrqamplJeX097ejl4vP8pjhWyOEMJONm3axLFjx2hubgbgT3+7xI2zJ3G//19oyNqJonNhRnwSAc/vxlT5LW2n8r5/p8rkxT8j8JVPmBG3F2trA81FfwS6W5S3d1k5ePAgWVlZNDc3S+COMRK6QthJQEAAy5YtIy0tDYDc3GPo3LzQe/rS8Y9SpvxrHLpJRlzcvfH66Traz3W3VzdMmYHbrAdR9AZcfjQZr4XrMFWdBrpP6jJbbLz66qsEBQXd8ZhF4XzkV6QQdhQbG8vevXuJi4vjdGEW7vevwNJSB1YrNX94tveFqg29lx8A1vZmrh9PpLP6DLauDlBVdMbebbqqCkFB/be8Ec5PQlcIO1q3bh1bt27l9OnT1JwqYtrmD8FFj6I3EPTLP6LoXPq8p6nwfwCFgBf+gIubFzcr/sr1/IRbzysK8uHZGCbTC0LYkdFoJDo6mpiYGGbNW4C7rz96Dx+MIQ/S9EUyts6bqKoNc9NVTFXfAaB2daCbZETn6o7lRgOtfzvcO55eh0EvP7ZjmXz3hLCz2NhYvvvuO16Oe/7WY1PX/AeqzcKV5K1U//cG6v/8O6xt1wGYvOQXdNVepHrn09Sl/ZYfzVl8630q4D6p792xGDsUdZAGRhEREWppaakDyxFi/KmqqiI0NJTa2lp+lVFB/rlrg/YNG4iiwL/Nm07CpojRL1KMKkVRvlFVtd9vlNzpCmFHNpuNDz74gA0bNuDl5cXLy+/DqL+7O9WJ0qJ8vJMP0oSwk/b2dqZPn05wcDDHjh0DeluUD/XshR4TqUX5eCehK4SduLu709bW1udxaVE+
sUnoCqEBaVE+cUnoCqERaVE+MUnoCqExaVE+scjqBSGEcCAJXSGEcCAJXSGEcCAJXSGEcCAJXSGEcCAJXSGEcKBBD7xRFKUeuOy4coQQYlwIVlXVr78nBg1dIYQQo0umF4QQwoEkdIUQwoEkdIUQwoEkdIUQwoEkdIUQwoH+D4zO3uFik6AiAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import networkx as nx\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "def create_graphs_of_words(docs, vocab, window_size):\n",
    "    # Build one undirected graph-of-words per tokenized document\n",
    "    graphs = list()\n",
    "\n",
    "    for doc in docs:\n",
    "        G = nx.Graph()\n",
    "        # One node per unique term, labeled with its vocabulary index\n",
    "        for term in doc:\n",
    "            if term not in G:\n",
    "                G.add_node(term, label=vocab[term])\n",
    "        # Link each term to the next (window_size - 1) terms in the document\n",
    "        for i in range(len(doc)):\n",
    "            for j in range(i+1, min(i+window_size, len(doc))):\n",
    "                G.add_edge(doc[i], doc[j])\n",
    "        \n",
    "        graphs.append(G)\n",
    "    \n",
    "    return graphs\n",
    "\n",
    "\n",
    "# Create graph-of-words representations (sliding window of size 3)\n",
    "G_train_nx = create_graphs_of_words(train_data, vocab, 3)\n",
    "G_test_nx = create_graphs_of_words(test_data, vocab, 3)\n",
    "\n",
    "print(\"Example of graph-of-words representation of document\")\n",
    "nx.draw_networkx(G_train_nx[3], with_labels=True)"
   ]
  },
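  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the sliding-window construction concrete, the cell below is a minimal sketch on a made-up five-token document. The tokens, the `toy_vocab` mapping, and the variable names are illustrative assumptions and are not part of the dataset; the cell reuses the `create_graphs_of_words()` function defined above. With a window size of 3, each term is linked to the two terms that follow it, and a repeated term reuses its existing node."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical toy document; 'graph' occurs twice but maps to a single node\n",
    "toy_doc = ['graph', 'kernel', 'text', 'graph', 'svm']\n",
    "toy_vocab = {term: idx for idx, term in enumerate(sorted(set(toy_doc)))}\n",
    "\n",
    "toy_graph = create_graphs_of_words([toy_doc], toy_vocab, 3)[0]\n",
    "\n",
    "# Each node carries its vocabulary index as the 'label' attribute\n",
    "print(\"Nodes:\", sorted(toy_graph.nodes(data='label')))\n",
    "print(\"Edges:\", sorted(tuple(sorted(e)) for e in toy_graph.edges()))"
   ]
  },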
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we use the *graph_from_networkx()* function of GraKeL to convert the NetworkX graphs into objects that GraKeL can handle. We then initialize a Weisfeiler-Lehman subtree kernel and use it to compute two kernel matrices: one between all pairs of training graphs, and one between the test graphs and the training graphs. Finally, we train an SVM classifier on the precomputed training kernel matrix and use it to predict the labels of the test documents."
   ]
  },
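  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small illustration of the conversion and kernel steps just described, the sketch below applies them to two hand-made labeled graphs. The graphs, their integer labels, and names such as `toy_graphs` and `toy_kernel` are assumptions for illustration only. The kernel is configured as in this tutorial, except that `normalize=True` is used here so that each graph's similarity to itself is 1 and the off-diagonal entry is easier to read."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import networkx as nx\n",
    "from grakel.utils import graph_from_networkx\n",
    "from grakel.kernels import WeisfeilerLehman, VertexHistogram\n",
    "\n",
    "# Two hypothetical path graphs; integer node labels stand in for vocabulary indices\n",
    "g1 = nx.Graph([(0, 1), (1, 2)])\n",
    "g2 = nx.Graph([(0, 1), (1, 2)])\n",
    "nx.set_node_attributes(g1, {0: 0, 1: 1, 2: 2}, 'label')\n",
    "nx.set_node_attributes(g2, {0: 0, 1: 1, 2: 3}, 'label')  # differs from g1 in one label\n",
    "\n",
    "# Convert to GraKeL graphs, reading node labels from the 'label' attribute\n",
    "toy_graphs = list(graph_from_networkx([g1, g2], node_labels_tag='label'))\n",
    "\n",
    "# 2 x 2 kernel matrix of pairwise similarities between the toy graphs\n",
    "toy_kernel = WeisfeilerLehman(n_iter=1, normalize=True, base_graph_kernel=VertexHistogram)\n",
    "print(toy_kernel.fit_transform(toy_graphs))"
   ]
  },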
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy: 0.858\n"
     ]
    }
   ],
   "source": [
    "from grakel.utils import graph_from_networkx\n",
    "from grakel.kernels import WeisfeilerLehman, VertexHistogram\n",
    "from sklearn.svm import SVC\n",
    "from sklearn.metrics import accuracy_score\n",
    "\n",
    "# Transform networkx graphs to grakel representations\n",
    "G_train = list(graph_from_networkx(G_train_nx, node_labels_tag='label'))\n",
    "G_test = list(graph_from_networkx(G_test_nx, node_labels_tag='label'))\n",
    "\n",
    "# Initialize a Weisfeiler-Lehman subtree kernel\n",
    "gk = WeisfeilerLehman(n_iter=1, normalize=False, base_graph_kernel=VertexHistogram)\n",
    "\n",
    "# Construct kernel matrices (train vs. train, and test vs. train)\n",
    "K_train = gk.fit_transform(G_train)\n",
    "K_test = gk.transform(G_test)\n",
    "\n",
    "# Train an SVM classifier on the precomputed kernel and make predictions\n",
    "clf = SVC(kernel='precomputed')\n",
    "clf.fit(K_train, y_train)\n",
    "y_pred = clf.predict(K_test)\n",
    "\n",
    "# Evaluate the predictions (scikit-learn expects the true labels first)\n",
    "print(\"Accuracy:\", accuracy_score(y_test, y_pred))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
